The security of artificial intelligence (AI) is an important research area for building safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, the China Industrial Control Systems Cyber Emergency Response Team, the Institute for Artificial Intelligence of Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of the three tracks and the solutions of the top-ranking teams in each track.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
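LiDAR-based simultaneous localization and mapping (SLAM) has been widely adopted by mobile robots and autonomous vehicles. These SLAM systems must support accurate localization with limited computational resources. In particular, point cloud registration, i.e., the process of matching and aligning multiple LiDAR scans collected at multiple locations in a global coordinate frame, is regarded as the bottleneck step in SLAM. In this paper, we propose a feature filtering algorithm, PFilter, that filters out invalid features and can thus greatly alleviate this bottleneck. At the same time, the overall registration accuracy also improves thanks to the carefully curated feature points. We integrate PFilter into the well-established scan-to-map LiDAR odometry framework F-LOAM and evaluate its performance on the KITTI dataset. The experimental results show that PFilter removes about 48.4% of the points in the local feature map and reduces the feature points in each scan by 19.3% on average, saving 20.9% of the processing time per frame. Meanwhile, we improve the accuracy by 9.4%.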
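Generating melody from lyrics is an interesting yet challenging task in the area of artificial intelligence and music. However, the difficulty of keeping the input lyrics and the generated melody consistent limits the generation quality of previous works. In this proposal, we demonstrate our interpretable lyrics-to-melody generation system, which can interact with users so that they can understand the generation process and recreate the desired songs. To improve the reliability of generating melodies that match the lyrics, mutual information is exploited to strengthen the consistency between the lyrics and the generated melodies. Gumbel-Softmax is used to solve the non-differentiability problem of generating discrete music attributes with generative adversarial networks (GANs). Moreover, the predicted probabilities output by the generator are used to recommend music attributes. By interacting with our lyrics-to-melody generation system, users can listen to the generated AI song and recreate a new song by selecting from the recommended music attributes.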
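The Gumbel-Softmax step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gumbel_softmax`, the temperature value, and the three hypothetical pitch scores are assumptions made only for illustration. The sketch shows just the forward sampling; in a GAN the same expression would be written in an autodiff framework so that gradients can flow through the relaxed sample.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample from the categorical distribution given by logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
pitch_logits = [2.0, 0.5, -1.0]  # hypothetical scores for three discrete pitch classes
soft = gumbel_softmax(pitch_logits, temperature=0.5)
hard_choice = max(range(len(soft)), key=soft.__getitem__)  # index of the selected class
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother; the discrete attribute is read off with an argmax, as in `hard_choice`.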
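Humans with an average level of social cognition can infer the beliefs of others based solely on nonverbal communication signals (e.g., gaze, gesture, pose, and contextual information). This social cognitive ability to predict human beliefs and intentions is more important than ever for ensuring safe human-robot interaction and collaboration. This paper uses the combined knowledge of Theory of Mind (ToM) and object-context relations to investigate methods for enhancing collaboration between humans and autonomous systems in environments where verbal communication is prohibited. We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems to predict human belief states in an object-context scenario. The proposed dataset consists of precisely labelled ground truth for human belief states and multimodal inputs that replicate all nonverbal communication inputs captured by human perception. We further evaluate the dataset with existing deep learning models and provide new insights into the effects of the various input modalities and object-context relations on the performance of the baseline models.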
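In reality, it is often more efficient to ask for help than to search the entire space to find an object whose location is unknown. We present a learning framework that enables an agent to actively ask for help in such embodied visual navigation tasks, where the feedback tells the agent where the goal is in its view. To emulate the real-world scenario in which a teacher may not always be present, we propose a training curriculum in which feedback is not always available. We formulate an uncertainty measure of where the goal is and use empirical results to show that, with this approach, the agent learns to ask for help effectively while remaining robust when feedback is not available.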
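Children's cognitive abilities are sometimes cited as AI benchmarks. How can the 1,000 most common concepts (89% of everyday use) be learned in a naturalistic children's setting? Cognitive development in children is about quality, and new concepts can be conveyed through simple examples. Our knowledge scaffolding approach uses simple objects and actions to convey concepts, much as children are taught. We introduce ABCDE, an interactive 3D environment modeled after a typical children's playroom. It comes with more than 300 unique 3D object assets (mostly toys) and a large action space in which child and parent agents can interact with objects. ABCDE is the first environment aimed at mimicking a naturalistic setting for cognitive development in children; no other environment studies high-level concept learning through learner interactions. The simulator can be found at https://pypi.org/project/abcdesim/1.0.0/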
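Annotating enough data to satisfy sophisticated learning models can be cost-prohibitive for many real-world applications. Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means of alleviating this data-hungry problem. Some recent studies have explored the potential of combining AL and SSL to better probe the unlabeled data. However, almost all of these contemporary SSL-AL works adopt a simple combination strategy and ignore the inherent relation between SSL and AL. Furthermore, other methods suffer from high computational costs when dealing with large-scale, high-dimensional datasets. Motivated by the industry practice of labeling data, we propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate the potential superiority of SSL-AL and to achieve mutual enhancement of AL and SSL: SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL. We estimate the inconsistency of unlabeled samples through augmentation strategies of different granularities, including fine-grained continuous perturbation exploration and coarse-grained data transformations. Extensive experiments in both the text and image domains validate the effectiveness of the proposed algorithm and compare it against state-of-the-art baselines. Two real-world case studies illustrate the practical industrial value of applying and deploying the proposed data sampling algorithm.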
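The idea of scoring an unlabeled sample by how inconsistent its predictions are across augmentations can be sketched as follows. This is an illustrative stand-in, not the IDEAL algorithm itself: the abstract does not give the exact measure, so the sketch assumes a simple one (the mean KL divergence of each augmented prediction from their average), and the names `inconsistency`, `most_inconsistent`, and the toy `pool` are hypothetical.

```python
import math

def inconsistency(predictions):
    """Mean KL divergence of each augmented prediction from the average prediction.

    predictions: list of class-probability lists, one per augmented view of a sample.
    This divergence is an assumed stand-in, not the measure used in the paper.
    """
    n_classes = len(predictions[0])
    mean = [sum(p[c] for p in predictions) / len(predictions) for c in range(n_classes)]
    total = 0.0
    for p in predictions:
        # KL(p || mean); classes with p[c] == 0 contribute nothing to the sum.
        total += sum(p[c] * math.log(p[c] / mean[c]) for c in range(n_classes) if p[c] > 0)
    return total / len(predictions)

def most_inconsistent(pool, k):
    """Return the ids of the k samples whose predictions vary most across augmentations."""
    ranked = sorted(pool, key=lambda sid: inconsistency(pool[sid]), reverse=True)
    return ranked[:k]

# Toy pool: predictions for two unlabeled samples under three hypothetical augmentations.
pool = {
    "stable": [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],
    "unstable": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
}
picked = most_inconsistent(pool, k=1)
```

Under this assumed score, samples whose predictions flip between augmented views rank first and would be the candidates sent for labeling, while stable samples stay in the unlabeled pool.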
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
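Conventional methods for image-text generation mainly tackle the two naturally bidirectional generation tasks separately, focusing on designing task-specific frameworks to improve the quality and fidelity of the generated samples. Recently, vision-language pre-training models have greatly improved the performance of image-to-text generation, but large-scale pre-training models for text-to-image synthesis are still under-developed. In this paper, we propose ERNIE-ViLG, a unified generative pre-training framework for bidirectional image-text generation with a transformer model. Based on image quantization models, we formulate both image generation and text generation as autoregressive generative tasks conditioned on the text or image input. Bidirectional image-text generative modeling eases the semantic alignment across vision and language. For the text-to-image generation process, we further propose an end-to-end training method that jointly learns the visual sequence generator and the image reconstructor. To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.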